Independence Assumptions Considered Harmful
نویسنده
چکیده
Many current approaches to statistical language modeling rely on independence a.~sumptions 1)etween the different explanatory variables. This results in models which are computationally simple, but which only model the main effects of the explanatory variables oil the response variable. This paper presents an argmnent in favor of a statistical approach that also models the interactions between the explanatory variables. The argument rests on empirical evidence from two series of experiments concerning automatic ambiguity resolution. 1 I n t r o d u c t i o n In this paper, we present an empirical argument in favor of a certain approach to statistical natural language modeling: we advocate statistical natural language models that account for the interactions between the explanatory statistical variables, rather than relying on independence a~ssumptions. Such models are able to perform prediction on the basis of estimated probability distributions that are properly conditioned on the combinations of the individual values of the explanatory variables. After describing one type of statistical model that is particularly well-suited to modeling natural language data, called a loglinear model, we present einpirical evidence fi'om a series of experiments on different ambiguity resolution tasks that show that the performance of the loglinear models outranks the performance of other models described in the literature that a~ssume independence between the explanatory variables. 2 S t a t i s t i c a l L a n g u a g e M o d e l i n g By "statistical language model", we refer to a mathematical object that "imitates the properties" of some respects of naturM language, and in turn makes predictions that are useful from a scientific or engineering point of view. Much recent work in this flamework hm~ used written and spoken natural language data to estimate parameters for statisticM models that were characterized by serious limitations: models were either limited to a single explanatory variable or. if more than one explanatory variable wa~s considered, the variables were assumed to be independent. In this section, we describe a method for statistical language modeling that transcends these limitations. 2.1 Categor i ca l D a t a Analys i s Categorical data analysis is the area of statistics that addresses categorical statistical variable: variables whose values are one of a set of categories. An exampie of such a linguistic variable is PART-OF-SPEECH, whose possible values might include nou.n, verb, determiner, preposition, etc. We distinguish between a set of explanatory variames. and one response variable. A statistical model can be used to perforin prediction in the following manner: Given the values of the explanatory variables, what is the probability distribution for the response variable, i.e.. what are the probabilities for the different possible values of the response variable? 2.2 The Cont ingency Table Tile ba,sic tool used in categorical data analysis is the contingency table (sometimes called the "crossclassified table of counts"). A contingency table is a matrix with one dimension for each variable, including the response variable. Each cell ill the contingency table records the frequency of data with the appropriate characteristics. Since each cell concerns a specific combination of feat.ures, this provides a way to estimate probabilities of specific feature combinations from the observed frequencies, ms the cell counts can easily be converted to probabilities. Prediction is achieved by determining the value of the response variable given the values of the explanatory variables.
منابع مشابه
A Bayesian Approach to Learning Causal Networks
Whereas acausal Bayesian networks represent probabilistic independence, causal Bayesian networks represent causal relationships. In this paper, we examine Bayesian methods for learning both types of networks. Bayesian methods for learning acausal networks are fairly well developed. These methods often employ assumptions to facilitate the construction of priors, including the assumptions of para...
متن کاملClustering processes
The problem of clustering is considered, for the case when each data point is a sample generated by a stationary ergodic process. We propose a very natural asymptotic notion of consistency, and show that simple consistent algorithms exist, under most general non-parametric assumptions. The notion of consistency is as follows: two samples should be put into the same cluster if and only if they w...
متن کاملOn the independence of digits in connected digit strings
One of the frequently used assumptions in Speaker Veriication is that two speech segments (phonemes, subwords, words) are considered to be independent. And therefore, the log-likelihood of a test utterance is just the sum of the log-likelihoods of the speech segments in that utterance. This paper reports about cases in which this observation-independence assumption seems to be violated, namely ...
متن کاملMCMC Sampling for a Multilevel Model With Nonindependent Residuals Within and Between Cluster Units
In this article, we discuss the effect of removing the independence assumptions between the residuals in two-level random effect models. We first consider removing the independence between the Level 2 residuals and instead assume that the vector of all residuals at the cluster level follows a general multivariate normal distribution. We demonstrate how this assumption can allow us to fit higher...
متن کاملConverging Evidence of Linear Independent Components in EEG
Blind source separation (BSS) has been proposed as a method to analyze multi-channel electroencephalography (EEG) data. A basic issue in applying BSS algorithms is the validity of the independence assumption. In this paper we investigate whether EEG can be considered to be a linear combination of independent sources. Linear BSS can be obtained under the assumptions of non-Gaussian, non-stationa...
متن کاملHarmony Assumptions in Information Retrieval and Social Networks
In many applications, independence of event occurrences is assumed, even if there is evidence for dependence. Capturing dependence leads to complex models, and even if the complex models were superior, they fail to beat the simplicity and scalability of the independence assumption. Therefore, many models assume independence and apply heuristics to improve results. Theoretical explanations of th...
متن کامل